
updates for 25.12 #997

Merged
eordentlich merged 9 commits into NVIDIA:main from eordentlich:eo_25.12_nightly on Dec 16, 2025

Conversation

@eordentlich
Collaborator

@eordentlich eordentlich commented Dec 9, 2025

Mainly attempts to align with some breaking cuML changes, including:

  • logistic regression
    • the training objective is no longer computed in cuML (for consistency).
    • datasets with a single label value are no longer trainable in logistic regression.
    • L-BFGS params now need to be passed to the constructor.
  • some cuML model fields need to be set differently for some algorithms at inference time.
  • KMeansMG is back.
  • pylibraft, dask, and related packages are no longer default dependencies of cuML.
  • the scipy minimum is now 1.11, which raised some issues on Databricks 13.3.
  • Also fixes some DBSCAN tests that can fail in multi-GPU settings.
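Since cuML no longer reports the training objective, it has to be recomputed externally on the Spark side. A minimal sketch of what such a computation looks like for binary logistic regression (illustrative only; not the actual spark_rapids_ml.metrics.utils code, and the function name and signature are assumptions):

```python
import numpy as np

def logistic_objective(X, y, coef, intercept=0.0, reg_param=0.0):
    """Mean binary log-loss plus (reg_param/2)*||coef||^2, labels y in {0, 1}."""
    z = X @ coef + intercept
    # Signed margins: positive when the prediction agrees with the label.
    margins = np.where(y == 1, z, -z)
    # Numerically stable log(1 + exp(-m)) via logaddexp, averaged over rows.
    loss = np.logaddexp(0.0, -margins).mean()
    return loss + 0.5 * reg_param * float(np.dot(coef, coef))
```

With an all-zero coefficient vector, every row contributes log(2), which gives a quick sanity check for implementations of this kind.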

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@greptile-apps
Contributor

greptile-apps bot commented Dec 9, 2025

Greptile Summary

This PR updates the entire spark-rapids-ml codebase to align with cuML 25.12 nightly builds, addressing multiple breaking changes in the upstream RAPIDS ecosystem. The changes include comprehensive version updates across all components (Python package, JVM artifacts, Docker images, documentation) from 25.10.0 to 25.12.0, along with infrastructure updates to handle cuML's new dependency model.

Key technical changes include: adapting LogisticRegression to pass LBFGS parameters to the constructor instead of post-initialization and removing automatic objective computation; migrating KMeans from the deprecated class to KMeansMG; adding explicit installation of previously bundled dependencies (pylibraft, raft-dask) across all deployment scripts; updating model field initialization patterns for PCA, UMAP, and RandomForest to work with cuML's new internal structure; and enhancing test reliability by fixing DBSCAN tests that could fail in multi-GPU environments.

The PR maintains API compatibility for end users while adapting to significant internal changes in cuML 25.12, ensuring the spark-rapids-ml library continues to provide GPU-accelerated drop-in replacements for Spark ML algorithms.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| python/src/spark_rapids_ml/classification.py | 4/5 | Major LogisticRegression updates for cuML 25.12: constructor parameter changes, single-label exception handling, removed objective computation |
| python/src/spark_rapids_ml/clustering.py | 4/5 | Migrated KMeans from the deprecated class to KMeansMG, removed the multigpu parameter, updated model construction patterns |
| python/src/spark_rapids_ml/regression.py | 4/5 | Updated LinearRegression defaults, reorganized imports, fixed model field initialization for n_features_in_ and n_cols |
| python/src/spark_rapids_ml/umap.py | 4/5 | Updated UMAP model construction with new attribute names, parameter handling, and embedding array wrapping for cuML 25.12 |
| python/tests/test_logistic_regression.py | 4/5 | Comprehensive test updates: new objective utility import, parameter validation changes, constructor updates, error handling changes |
| notebooks/databricks/init-pip-cuda-12.sh | 4/5 | CUDA upgrade to 12.2.2, RAPIDS 25.12.0 update, explicit dependency installation, numpy pinning for Databricks compatibility |
| python/src/spark_rapids_ml/feature.py | 5/5 | Fixed PCA dtype access, corrected a typo, updated n_features_in_ initialization for cuML 25.12 compatibility |
| python/src/spark_rapids_ml/metrics/utils.py | 5/5 | New utility module implementing the logistic regression objective calculation, since cuML no longer computes it automatically |
| python/src/spark_rapids_ml/tree.py | 5/5 | Fixed RandomForest model reconstruction by switching from CuPy to NumPy arrays for classes_ attribute initialization |
| python/tests/test_dbscan.py | 5/5 | Improved multi-GPU test reliability by adding explicit ID fields and sorting operations for deterministic results |
| python/src/spark_rapids_ml/connect_plugin.py | 5/5 | Removed the objective attribute from LogisticRegression serialization, since cuML 25.12 no longer computes it |
| docker/Dockerfile.pip | 5/5 | Updated CUDA to 12.2.2 and RAPIDS to 25.12.0, added explicit pylibraft and raft-dask dependencies |
| docker/Dockerfile.python | 5/5 | CUDA upgrade, variable rename to RAPIDS_VERSION, explicit RAPIDS component installation |
| ci/Dockerfile | 5/5 | CI environment update: CUDA 12.2.2, RAPIDS 25.12, explicit dependency installation |
| python/pyproject.toml | 5/5 | Simple version bump from 25.10.0 to 25.12.0 in package metadata |
| python/src/spark_rapids_ml/__init__.py | 5/5 | Package version update from 25.10.0 to 25.12.0 |
| jvm/pom.xml | 5/5 | Maven artifact version update from 25.10.0 to 25.12.0 |
| docs/source/conf.py | 5/5 | Sphinx documentation version update to 25.12.0 |

Confidence score: 4/5

  • This PR requires careful review due to extensive breaking changes across multiple algorithms and infrastructure components
  • Score reflects the comprehensive nature of changes affecting LogisticRegression parameter handling, KMeans algorithm migration, dependency management, and model field initialization patterns that could impact runtime behavior
  • Pay close attention to classification.py, clustering.py, regression.py, and umap.py which contain complex logic changes affecting model training and inference

Sequence Diagram

sequenceDiagram
    participant Developer
    participant Docker as Docker Build
    participant cuML as cuML 25.12
    participant LogReg as Logistic Regression
    participant Tests as Test Suite
    participant Docs as Documentation

    Developer->>Docker: Update RAPIDS_VERSION to 25.12
    Docker->>Docker: Install cuml=25.12 cuvs=25.12 pylibraft=25.12
    Docker->>Docker: Update scipy minimum to 1.11
    
    Developer->>LogReg: Handle cuML breaking changes
    LogReg->>LogReg: Remove training objective computation
    LogReg->>LogReg: Handle single label value cases
    LogReg->>LogReg: Update lbfgs parameter handling
    LogReg->>cuML: Set model fields for inference
    
    Developer->>Tests: Fix DBSCAN multi-GPU issues
    Tests->>Tests: Update test configurations
    Tests->>Tests: Handle sparse data optimization
    
    Developer->>Docs: Update version references
    Docs->>Docs: Update notebook examples
    Docs->>Docs: Refresh API documentation
    
    Developer->>Tests: Run full test suite
    Tests->>cuML: Validate cuML 25.12 compatibility
    Tests->>LogReg: Verify logistic regression fixes
    Tests->>Tests: Confirm DBSCAN stability
    
    Tests-->>Developer: All tests passing
    Developer->>Developer: Finalize PR for 25.12 release

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (3)

  1. python/benchmark/databricks/gpu_cluster_spec.sh, line 58

    logic: The init script init-pip-cuda-13.0-nightly.sh does not exist in the repository. This will cause cluster creation to fail. The available script is init-pip-cuda-12.0.sh.

  2. python/benchmark/databricks/gpu_etl_cluster_spec.sh, line 69

    logic: The init script init-pip-cuda-12.0-nightly.sh does not exist in the repository. This will cause cluster creation to fail. The available script is init-pip-cuda-12.0.sh.

  3. python/README.md, line 25

    syntax: The $RAPIDS_VERSION shell variable is used but not defined in this context. This command won't work as documented.
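The usual fix for comment 3 is simply to define the variable before the command that expands it; an illustrative shell sketch (the version value and package name here are assumptions, not taken from the repository):

```shell
# Define RAPIDS_VERSION before any command that expands it.
RAPIDS_VERSION=25.12.0

# A command like the README's can then expand it safely:
echo "pip install cuml-cu12==${RAPIDS_VERSION}"

# Fail fast with a clear message if the variable is unset or empty:
: "${RAPIDS_VERSION:?RAPIDS_VERSION must be set}"
```

Documentation that uses such a variable typically either defines it inline like this or tells the reader which value to export first.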

32 files reviewed, 3 comments


@eordentlich
Collaborator Author

build

@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


32 files reviewed, no comments


@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (2)

  1. notebooks/databricks/init-pip-cuda-12.sh, line 46

    logic: The numpy~=1.0 constraint is too broad and will install numpy 1.x, which could include very old incompatible versions. It should specify a more restrictive minimum version. What specific numpy version range is required for compatibility with RAPIDS 25.12 and Databricks 13.3?

  2. python/README.md

    logic: The $RAPIDS_VERSION variable is referenced but not defined in the conda command. Users will get an error unless this variable is set.

    Should this use a specific version number like the other packages, or should there be instructions to define RAPIDS_VERSION first?
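PEP 440's compatible-release operator makes the first comment concrete: numpy~=1.0 expands to >=1.0, ==1.*, so every 1.x release is allowed, including very old ones. A quick check using the third-party packaging library:

```python
# Demonstrates why "~=1.0" is called overly permissive: per PEP 440 it is
# equivalent to ">=1.0, ==1.*", i.e. it admits the entire 1.x series.
from packaging.specifiers import SpecifierSet

loose = SpecifierSet("~=1.0")
print("1.0.0" in loose)    # the very first 1.x is allowed
print("1.26.4" in loose)   # so is every later 1.x
print("2.0.0" in loose)    # only 2.x and beyond are excluded

# The shape of constraint the reviewer suggests instead:
tight = SpecifierSet(">=1.21,<2.0")
print("1.19.0" in tight)   # too old, now rejected
```

The tighter form pins both ends of the range, which is usually what "compatible with the RAPIDS ecosystem" requires.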

35 files reviewed, 2 comments


@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (4)

  1. python/README.md, line 25

    style: The $RAPIDS_VERSION variable is used but not defined in this documentation; users may be confused about what value to substitute.

    Should the documentation include instructions on how to set or determine the appropriate RAPIDS_VERSION value?

  2. notebooks/databricks/init-pip-cuda-12.sh, line 46

    style: The numpy version constraint ~=1.0 is extremely permissive and may not address the scipy 1.11 compatibility issue mentioned in the PR description. Should this be a more specific numpy version constraint to ensure compatibility with scipy 1.11 requirements?

  3. python/src/spark_rapids_ml/metrics/utils.py, line 56

    style: The is True comparison is unnecessary; if lr_model.getStandardization(): would be more pythonic.

  4. python/src/spark_rapids_ml/regression.py, line 806

    style: Setting n_cols twice appears redundant; lines 801 and 806 both set the same attribute.

    Is the duplicate assignment on line 806 intentional for compatibility reasons, or should one of these be removed?
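Item 3's suggestion in miniature (LRModel here is a hypothetical stand-in, not the real PySpark model class):

```python
# Minimal stand-in for a model whose getter returns a plain bool.
class LRModel:
    def getStandardization(self) -> bool:
        return True

lr_model = LRModel()

# Unidiomatic: identity comparison against the True singleton.
standardize = lr_model.getStandardization() is True

# Idiomatic: rely on truthiness directly. For a bool-returning getter the
# two are equivalent, and this form also tolerates truthy non-bool returns.
standardize = bool(lr_model.getStandardization())
```

The `is True` form only differs in behavior when the getter can return a truthy non-True object, which a boolean Spark param getter does not.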

36 files reviewed, 4 comments


@eordentlich eordentlich changed the title from "updates for 25.12 nightly" to "updates for 25.12" on Dec 12, 2025
@eordentlich
Collaborator Author

build

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (4)

  1. python/benchmark/databricks/cpu_cluster_spec.sh, line 27

    style: The spot_bid_price_percent parameter is now unused since availability is ON_DEMAND.

  2. notebooks/databricks/init-pip-cuda-12.sh, line 45

    logic: The numpy version constraint ~=1.0 seems overly permissive and may allow incompatible versions. Should this be a more restrictive constraint like numpy>=1.21,<2.0 to ensure compatibility with the RAPIDS ecosystem?

  3. python/src/spark_rapids_ml/regression.py, line 809

    style: Redundant assignment of lr.n_cols, since it was already set on line 804.

  4. python/src/spark_rapids_ml/classification.py, lines 1074-1091

    style: The exception handling properly catches cuML's new single-label restriction, but the string-matching approach using traceback.format_exc() is fragile and could break if cuML's error messages change.
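One way to avoid the fragile traceback-string matching flagged in item 4 is to detect the degenerate single-label case before calling into cuML at all. A hypothetical sketch (fit_with_label_guard, fit_fn, and the dict return value are illustrative, not code from this PR):

```python
import numpy as np

def fit_with_label_guard(fit_fn, X, y):
    """Check label cardinality up front instead of parsing cuML's error
    text after the fact, which breaks whenever the message wording changes."""
    classes = np.unique(y)
    if classes.size < 2:
        # Degenerate case: describe a constant predictor for the lone class
        # rather than invoking a trainer that would reject the input.
        return {"single_class": classes[0].item()}
    return fit_fn(X, y)
```

Here fit_fn stands in for the real training call; the guard makes the single-label branch explicit and independent of upstream exception messages.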

36 files reviewed, 4 comments


Collaborator

@rishic3 rishic3 left a comment


👍

@eordentlich eordentlich merged commit ddfa00b into NVIDIA:main Dec 16, 2025
4 checks passed
@eordentlich eordentlich deleted the eo_25.12_nightly branch December 16, 2025 21:03
@YanxuanLiu YanxuanLiu mentioned this pull request Dec 17, 2025
YanxuanLiu added a commit that referenced this pull request Dec 17, 2025
default CUDA version has been updated in #997

Update image tag to use latest docker image to run CI.

Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>